Average Number of Frequent and Closed Patterns in Random Databases

Authors

  • Loïck Lhote
  • François Rioult
  • Arnaud Soulet
Abstract

Frequent and closed patterns are at the core of numerous knowledge discovery processes. Mining them is known to be difficult because the search space is huge and grows exponentially with the number of attributes. Unfortunately, most studies of pattern mining do not address the difficulty of the task and simply provide their own algorithm. In this paper, we propose new results on the average number of frequent patterns, obtained with probabilistic techniques, and we extend these results to the number of closed patterns. As a first step, the probabilistic model is simple and far from real-life data, since the attributes and the objects are considered independent. Nevertheless, this model explains the frequency threshold phenomena observed in practice. We also prove that, for a fixed threshold, the number of frequent patterns is asymptotically exponential in the number of attributes and polynomial in the number of objects, whereas for a frequency threshold proportional to the number of objects, the number of frequent and closed patterns is asymptotically polynomial in the number of attributes and does not depend on the number of objects.

Keywords: data mining, average analysis, frequent and closed patterns
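
To make the independence model concrete, here is a minimal sketch, not the authors' derivation. It assumes each cell of an n-object by m-attribute binary table is 1 independently with the same probability p (the parameter names n, m, p and gamma are illustrative and do not come from the abstract). Under that assumption, a pattern of k attributes is supported by one object with probability p^k, so by linearity of expectation the average number of non-empty frequent patterns is the sum over k of C(m, k) times P(Binomial(n, p^k) >= gamma). The second function counts frequent and closed patterns by brute force on a tiny explicit database, which is only practical for small m.

```python
from itertools import combinations
from math import comb


def binom_tail(n, q, gamma):
    """P(X >= gamma) for X ~ Binomial(n, q)."""
    return sum(comb(n, j) * q**j * (1 - q) ** (n - j) for j in range(gamma, n + 1))


def expected_frequent_patterns(n, m, p, gamma):
    """Average number of non-empty frequent patterns, assuming i.i.d. Bernoulli(p) cells.

    A k-attribute pattern is present in one object with probability p**k,
    so its support is Binomial(n, p**k); there are C(m, k) such patterns.
    """
    return sum(comb(m, k) * binom_tail(n, p**k, gamma) for k in range(1, m + 1))


def count_frequent_and_closed(db, m, gamma):
    """Brute-force count of (frequent, frequent closed) non-empty patterns.

    db is a list of sets of attribute indices in range(m); gamma is an
    absolute support threshold (at least 1).
    """
    if gamma < 1:
        raise ValueError("gamma must be at least 1")
    frequent = closed = 0
    for k in range(1, m + 1):
        for pattern in combinations(range(m), k):
            items = set(pattern)
            cover = [row for row in db if items <= row]
            if len(cover) < gamma:
                continue
            frequent += 1
            # A pattern is closed when no attribute can be added without
            # losing support, i.e. it equals the intersection of its cover.
            if set.intersection(*cover) == items:
                closed += 1
    return frequent, closed


if __name__ == "__main__":
    # Fixed threshold: the average grows quickly with the number of attributes.
    for m in (4, 8, 12):
        print(m, expected_frequent_patterns(n=50, m=m, p=0.5, gamma=3))
    print(count_frequent_and_closed([{0, 1}, {0, 1}, {0}], m=2, gamma=2))
```

Evaluating expected_frequent_patterns for a fixed gamma and for gamma proportional to n gives a quick numerical feel for the two regimes contrasted in the abstract, without claiming to reproduce the paper's asymptotic results.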

Similar resources

An Efficient Algorithm for Enumerating Closed Patterns in Transaction Databases

The class of closed patterns is a well-known condensed representation of frequent patterns and has recently attracted considerable interest. In this paper, we propose an efficient algorithm, LCM (Linear time Closed pattern Miner), for mining frequent closed patterns from large transaction databases. The main theoretical contribution is our proposed prefix-preserving closure extension of closed...

ClosedPROWL: Efficient Mining of Closed Frequent Continuities by Projected Window List Technology

Mining frequent patterns in databases is a fundamental and essential problem in data mining research. A continuity is a kind of causal relationship which describes a definite temporal factor with exact position between the records. Since continuities break the boundaries of records, the number of potential patterns will increase drastically. An alternative approach is to mine closed frequent co...

Evaluating the efficiency of Iranian industrial universities based on non-parametric and parametric approaches

The present study evaluates the efficiency of Iranian industrial universities using the non-parametric method of data envelopment analysis and the parametric method of stochastic frontier analysis, with input variables (number of incoming students, number of faculty members, number of staff and budget) and output variables (specific income, number of students studying, number of graduates and conference papers) and...

CISpan: Comprehensive Incremental Mining Algorithms of Closed Sequential Patterns for Multi-Versional Software Mining

Recently, frequent sequential pattern mining algorithms have been widely used in the software engineering field to mine various source code or specification patterns. In practice, software evolves from one version to another over its life span. The effort of mining frequent sequential patterns across multiple versions of a software system can be substantially reduced by efficient incremental mining. This pr...

CPM Algorithm for Mining Association Rules from Databases of Engineering Design Instances

In this paper, we propose an algorithm for mining association rules based on transaction combination, attribute combination, pattern comparison and comparative pattern mapping (CPM), aimed at databases with a large number of attributes but a small number of transactions, which are common in engineering design. There are four main steps in the CPM algorithm. First, it scans and expands the d...

Publication date: 2005